Datamapper: A Documentation Generator for SAS Metadata

Author

Lei Zhang

Abstract

SAS metadata is data about SAS datasets, and it is critical to the effective manipulation and analysis of SAS data. SAS metadata exploration traditionally proceeds as ad hoc programming against SAS Dictionary tables, which is inadequate, inefficient, and sometimes complicated. Based on SAS ODBC functionality and hypertext techniques, a documentation generator called Datamapper is being developed to automatically create a set of structured hyperdocuments (called a datamap) for SAS metadata. The datamap allows users to interactively access essential pieces of meta-information about SAS data objects or elements and to traverse between them along established hyperlinks, avoiding ad hoc SAS metadata programming. In this paper I first present a conceptual model for SAS metadata, then the hyperdocument architecture of the datamap designed with OOHDM (Object-Oriented Hypermedia Design Methodology), and finally discuss the development and implementation of Datamapper.

INTRODUCTION

SAS metadata is data about SAS datasets. It is an indispensable resource for SAS program development and maintenance. A SAS program is a complicated software artifact that executes on multiple SAS datasets, and both developing and maintaining it require that users have accurate, up-to-date meta-information about those datasets, such as SAS libraries, dataset structures, variables, formats, and the relationships between them. For many large clinical trial projects, the meta-information involves multiple data libraries with dozens of datasets and hundreds of SAS variables. However, producing SAS metadata documentation and keeping it up to date can be an expensive and time-consuming endeavor. Present approaches to SAS metadata are often inadequate to meet the diverse needs of the technical and non-technical users who consult it on a day-to-day basis.
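To make the idea of hyperlinked metadata documentation concrete, the following is a minimal sketch in Python, used here for illustration only; it is not the Datamapper implementation. The classes `Variable` and `Dataset`, the functions `page_name`, `shared_variables`, and `render_dataset`, and the `TRIAL` library with its `DEMOG` and `AE` datasets are all hypothetical. The sketch assumes the metadata has already been extracted (for example, from SAS Dictionary tables) and simply renders one dataset page whose variables link to other datasets that contain a variable of the same name.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Dict, List


@dataclass
class Variable:
    """One column of a dataset (hypothetical, simplified model)."""
    name: str
    type: str        # "num" or "char"
    label: str = ""


@dataclass
class Dataset:
    """One dataset inside a library (hypothetical, simplified model)."""
    libname: str
    name: str
    variables: List[Variable] = field(default_factory=list)


def page_name(ds: Dataset) -> str:
    """File name of the hyperdocument that describes one dataset."""
    return f"{ds.libname}.{ds.name}.html".lower()


def shared_variables(ds: Dataset, all_datasets: List[Dataset]) -> Dict[str, List[Dataset]]:
    """Map each variable name in ds to the other datasets that also contain it."""
    result = {}
    for var in ds.variables:
        result[var.name] = [
            other for other in all_datasets
            if other is not ds
            and any(v.name.upper() == var.name.upper() for v in other.variables)
        ]
    return result


def render_dataset(ds: Dataset, all_datasets: List[Dataset]) -> str:
    """Render one dataset as an HTML page with links to related datasets."""
    links = shared_variables(ds, all_datasets)
    rows = []
    for var in ds.variables:
        also_in = ", ".join(
            f'<a href="{page_name(o)}">{escape(o.libname)}.{escape(o.name)}</a>'
            for o in links[var.name]
        ) or "-"
        rows.append(
            f"<tr><td>{escape(var.name)}</td><td>{escape(var.type)}</td>"
            f"<td>{escape(var.label)}</td><td>{also_in}</td></tr>"
        )
    return (
        f"<html><body><h1>{escape(ds.libname)}.{escape(ds.name)}</h1>\n"
        "<table><tr><th>Variable</th><th>Type</th><th>Label</th><th>Also in</th></tr>\n"
        + "\n".join(rows)
        + "\n</table></body></html>"
    )


# Hypothetical example data: two datasets that share the SUBJID variable.
demog = Dataset("TRIAL", "DEMOG", [
    Variable("SUBJID", "num", "Subject ID"),
    Variable("AGE", "num", "Age in years"),
])
ae = Dataset("TRIAL", "AE", [
    Variable("SUBJID", "num", "Subject ID"),
    Variable("AETERM", "char", "Adverse event term"),
])

demog_page = render_dataset(demog, [demog, ae])
print(demog_page)
```

In this sketch the page for `TRIAL.DEMOG` links `SUBJID` to the `TRIAL.AE` page, which is the kind of traversal between data objects that a reader would otherwise have to reconstruct with ad hoc queries. Regenerating the pages from freshly extracted metadata is what would keep such documentation current.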
In this paper, I introduce a tool called Datamapper that automatically generates a collection of structured hyperdocuments, called a datamap, for SAS metadata. The datamap has three distinctive features: (1) it is created or updated mechanically and is therefore never out of date; (2) hyperlinks are created for a variety of data objects such as libraries, tables, formats, and variables; and (3) hidden relationships and dependencies among the SAS datasets are mined, recovered, and reflected in the generated datamap. As a result, browsing the datamap can often replace ad hoc querying and programming against SAS metadata.

Datamapper is a general-purpose utility for SAS metadata documentation, but many facets of its design are influenced by the goal of providing high-quality, effective metadata documentation for SAS datasets in clinical trial projects, which require that metadata documentation be especially precise, accurate, and easy to use. This is particularly important because SAS programmers involved in clinical trial projects must be able to obtain all of the information they need when they need it.

Presenting SAS metadata as hyperdocumentation raises a number of interesting problems in information retrieval and hypertext techniques. The evolving nature of SAS datasets and the diverse demands on metadata documentation present a very challenging environment. This is made more challenging by the variety of data objects that must be integrated into a coherent set of hyperdocuments for effective browsing and navigation.

This paper is organized as follows. Section 2 describes the problem domain and user requirements for SAS metadata documentation. Section 3 presents a conceptual model for the datamap, expressed as a UML class diagram. In Section 4, OOHDM is used to design the datamap navigational model.
Both the navigational space model and the navigation structure model are given in UML. In Section 5, HTML templates are used to present the model-based datamap. Section 6 describes the design and implementation of Datamapper. Finally, conclusions and plans for future work are presented in Section 7.

THE PROBLEM DOMAIN

The typical working environment for most SAS programmers includes the SAS System, a couple of local or remote platforms, dozens of datasets stored in multiple directories, and little accessible metadata documentation about the data. The SAS System provides only primitive support for the metadata of SAS datasets, such as Dictionary tables, so a SAS programmer can write a piece of SAS code or a macro with Proc SQL and/or Proc


Similar resources

Editing SAS Metadata – Automated From CSV Files Using XML String in SAS Data Integration Studio

The SAS Metadata Server® introduces a new world to clinical data programmers. It is a storage centre for information about every object in the SAS System®. In addition to table-level and column-level attributes such as SAS table labels and column lengths, extended attributes are built in and extensible, so that valuable business data can be kept in this centralized location. To acces...


Static Verification for Code Contracts

The Code Contracts project [3] at Microsoft Research enables programmers on the .NET platform to author specifications in existing languages such as C# and VisualBasic. To take advantage of these specifications, we provide tools for documentation generation, runtime contract checking, and static contract verification. This talk details the overall approach of the static contract checker and exa...


Practical Methods for Creating CDISC SDTM Domain Data Sets from Existing Data

Creating CDISC SDTM domain data sets from existing clinical trial data can be a challenging task, particularly if the database was not designed with the SDTM standards in mind. A key step in the process involves determining which of the SDTM domain data sets need to be produced for submission and then determining what conversion process will be necessary to produce them from the existing data. A...


Using bio.tools to generate and annotate workbench tool

Workbench and workflow systems such as Galaxy, Taverna, Chipster, or Common Workflow Language (CWL)-based frameworks facilitate access to bioinformatics tools in a user-friendly, scalable, and reproducible way. Still, the integration of tools in such environments remains a cumbersome, time-consuming, and error-prone process. A major consequence is the incomplete or outdated description of to...
